Effective statistical features for coding and non-coding DNA sequence classification for yeast, C. elegans and human

Authors

  • Alan Wee-Chung Liew
  • Yonghui Wu
  • Hong Yan
  • Mengsu Yang
Abstract

This study performs a quantitative evaluation of the different coding features in terms of their information content for the classification of coding and non-coding regions for three species. Our study indicated that coding features that are effective for yeast or C. elegans are generally not very effective for human, which has a short average exon length. By performing a correlation analysis, we identified a subset of human coding features with high discriminative power, but complementary in their information content. For this subset, a classification accuracy of up to 90% was obtained using a simple kNN classifier.
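The workflow summarized in the abstract (pick features that are individually discriminative yet mutually complementary, then classify with kNN) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how discriminative power or complementarity was measured, so the use of Pearson correlation for both, the greedy selection rule, the `max_corr` threshold, the feature names `f1`, `f2`, `f3`, and the toy values are all assumptions made for the example.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_complementary(features, labels, max_corr=0.8):
    """Greedy selection (assumed rule, not taken from the paper).

    Rank features by |correlation with the class label|, then keep a feature
    only if its |correlation| with every already-kept feature is below max_corr.
    """
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], labels)),
        reverse=True,
    )
    chosen = []
    for name in ranked:
        if all(abs(pearson(features[name], features[c])) < max_corr for c in chosen):
            chosen.append(name)
    return chosen


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Toy data: 1 = coding, 0 = non-coding. Values are invented for illustration.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "f1": [0.90, 0.80, 0.85, 0.20, 0.10, 0.15],
    "f2": [0.91, 0.79, 0.86, 0.21, 0.12, 0.14],  # nearly redundant with f1
    "f3": [0.60, 0.70, 0.40, 0.30, 0.50, 0.20],
}

chosen = select_complementary(features, labels)
train_X = [[features[name][i] for name in chosen] for i in range(len(labels))]
query = train_X[0]  # reuse a training point purely to show the call
predicted = knn_predict(train_X, labels, query, k=3)
```

With these toy values the near-duplicate pair `f1`/`f2` contributes only one feature to the selected subset, which is the sense of "complementary" assumed here. Real coding features and a held-out test set would be needed to reproduce anything like the accuracy reported in the abstract.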


Similar resources

Investigation of Polymorphisms in Non-Coding Region of Human Mitochondrial DNA in 31 Iranian Hypertrophic Cardiomyopathy (HCM) Patients

The D-loop region is a hot spot for mitochondrial DNA (mtDNA) alterations, containing two hypervariable segments, HVS-I and HVS-II. In order to identify polymorphic sites and the potential genetic background accounting for hypertrophic cardiomyopathy (HCM), the complete non-coding region of mtDNA from 31 unrelated HCM patients and 45 normal controls was sequenced. The sequences were aligne...


Bioinformatics Designing of 10-23 Deoxyribozyme against Coding Region of Beta-galactosidase Gene

Background: Deoxyribozymes (Dzs) can act as gene expression inhibitors at the mRNA level. Among Dzs, the 10-23 deoxyribozyme has significant potential for the treatment of diseases. The designed Dz includes a catalytic core of 15 deoxyribonucleotides and two binding arms consisting of 6-12 nucleotides for site-specific binding to target RNA and hydrolysis. The enzyme has characteristic feature...


Phylogenetic Analysis of Three Long Non-coding RNA Genes: AK082072, AK043754 and AK082467

It is now clear that protein is just one of the functional products of the eukaryotic genome. Indeed, a major part of the human genome is transcribed into non-coding sequences rather than protein-coding sequences. In this study, we selected three long non-coding RNAs, namely AK082072, AK043754 and AK082467, which show brain expression and local region conservation among vertebr...


A New Method for Separating and Classifying Cancerous and Non-cancerous DNA Sequences Using LPC- and SVD-based Algorithms

The rising incidence of cancer has encouraged researchers to examine several aspects of this malignant disease. The genetically induced nature of cancer heightens the importance of studying intracellular components. This paper aims to extract specific and distinctive features from long DNA sequences by employing well-established DNA sequence analysis techniques. The ...


Non-coding stem-bulge RNAs are required for cell proliferation and embryonic development in C. elegans

Stem bulge RNAs (sbRNAs) are a family of small non-coding stem-loop RNAs present in Caenorhabditis elegans and other nematodes, the function of which is unknown. Here, we report the first functional characterisation of nematode sbRNAs. We demonstrate that sbRNAs from a range of nematode species are able to reconstitute the initiation of chromosomal DNA replication in the presence of replication...



Journal title:
  • International journal of bioinformatics research and applications

Volume 1, Issue 2

Pages -

Publication date: 2005